The 3' end of the turkey coronavirus (TCV) genome 
and the gene encoding the nucleocapsid protein (N) 
were cloned and sequenced. The gene encoding the 
membrane protein (M) was obtained by cloning a 
polymerase chain reaction (PCR)-amplified fragment 
obtained using bovine coronavirus (BCV)-specific 
primers. Furthermore, five TCV DNA fragments, 
obtained by PCR on RNA from clinical specimens and 
corresponding to either the N terminus of the M 
protein or the complete M protein were also cloned and 
sequenced. The sequence revealed a 3' non-coding 
region of 291 bases, an open reading frame (ORF) 
encoding the N protein with a predicted size of 448 
amino acids, or an Mr of 49K, and an ORF encoding 
the M protein with a predicted size of 230 amino acids 
and an Mr of 26K. A third ORF, encoding a 
hypothetical protein of 207 amino acids with an Mr of 
23K, was found within the N gene sequence. The amino 
acid sequences of both the N and M proteins were more 
than 99% similar to those published for BCV. 
Extensive similarity was also observed between the 
amino acid sequences of the TCV N protein and those 
of murine hepatitis virus (MHV) (70%) and human 
respiratory coronavirus strain OC43 (HCV-OC43) 
(98%) and between the amino acid sequences of the 
predicted M proteins of TCV and MHV (86%). Such 
striking identity suggests that BCV, TCV and 
HCV-OC43 must have diverged from each other only 
recently. A potential N-glycosylation site was found at 
the N terminus of the TCV M protein and is situated at 
the same location in BCV, MHV and transmissible 
gastroenteritis virus. 


Introduction 

The Coronaviridae family contains four antigenic groups 
(Pederson et al., 1978; Sturman & Holmes, 1983). 
Viruses within each group possess partial antigenic 
cross-reactivities and infect a variety of mammalian and 
avian species (Siddell et al., 1983). The viruses possess a 
single-stranded, polyadenylated RNA genome of about 
20 to 30 kb. The genes encoding the viral structural 
proteins are situated on the last quarter of the 3' end of 
the genome. Except for the nucleocapsid protein (N), all 
other structural proteins so far identified are associated 
with the lipid membrane. The integral membrane 
protein (M), which is largely embedded in the lipid 
bilayer, targets the site of virus morphogenesis (Tooze et 
al., 1984) and may be implicated in viral pathogenesis 
(Fleming et al., 1989), whereas the bulbous peplomer (S) 
protein is responsible for virus binding (Cavanagh & 
Davis, 1986; Koch et al., 1990) as well as virulence and 
tissue tropism (Wege et al., 1988). An additional surface 
protein (HE), responsible for haemagglutination, has 
been found in bovine coronavirus (BCV; King et al., 
1985; Hogue et al., 1989; Parker et al., 1989), human 
respiratory coronavirus strain OC43 (HCV-OC43; 
Hogue & Brian, 1985), haemagglutinating encephalitis virus 
of swine (Callebaut & Pensaert, 1980), diarrhoea virus of 
infant mice (Sugiyama et al., 1986) and turkey 
coronavirus (TCV; Dea et al., 1986). The HE protein of BCV 
also exhibits an acetyl esterase receptor-destroying 
activity similar to the activity found in influenza C 
viruses (Vlasak et al., 1988). 

Our recent studies demonstrated a close antigenic 
relatedness between TCV and BCV; only a few 
monoclonal antibodies produced against either of the 
two viruses were able to differentiate between them, 
indicating that TCV, which is still placed in an antigenic 
group distinct from avian infectious bronchitis virus 
(IBV) and the mammalian coronaviruses, should be 
reclassified (Dea et al., 1990). Homology between BCV 
and TCV was further established in hybridization assays 
(unpublished results). It was demonstrated that BCV- 
specific probes were efficient in detection and clinical 
diagnosis of TCV. In order to determine the extent of 
homology between the two viruses, we cloned and 
sequenced the genes encoding the N and M proteins. 
One of the structural differences observed between TCV 
and BCV is the type of glycosylation of the M protein, 
which is N- and O-glycosylated in TCV and BCV, 
respectively (Dea et al., 1989a; Lapps et al., 1987). We 
therefore used the polymerase chain reaction (PCR) on 
nucleic acid isolated from TCV-positive clinical 
specimens to obtain the M gene or gene fragments 
corresponding to the N-terminal portion of the M protein. 
These fragments were cloned and sequenced to establish 
possible sequence differences associated with the 
predicted glycosylation sites and to confirm the reliability of 
the obtained TCV sequence. 
Methods 

Virus and cells. The prototype Minnesota strain of TCV (TCV- 
Minnesota) (Ritchie et al., 1973), kindly supplied by Dr B. S. Pomeroy 
(College of Veterinary Medicine, St Paul, Mn., U.S.A.), was initially 
serially propagated by inoculation into the amniotic cavity of 22- to 24- 
day-old embryonating turkey eggs and further propagated on HRT-18 
cells in the presence of 1 unit/ml bovine pancreatic trypsin as described 
earlier (Laporte et al., 1980; Dea et al., 1989b). 

Synthesis and cloning of cDNA. Purified, tissue culture-adapted TCV- 
Minnesota was used for the extraction of RNA (Verbeek & Tijssen, 
1988), which was reverse transcribed according to standard procedures 
(Binns et al., 1985; Gubler & Hoffman, 1983). Tailing (Roychoudhury 
& Wu, 1980) and cloning of cDNA molecules for the construction of a 
genomic library were as described for BCV (Verbeek & Tijssen, 1988). 

Clone selection and DNA sequencing. Clones from the TCV cDNA 
library were screened in duplicate by colony hybridization assays 
(Grunstein & Hogness, 1975) with one probe (a 32P-labelled recombinant 
plasmid; Rigby et al., 1977) containing sequences corresponding to 
the 3' end of the BCV genome as well as a part of the N gene, and 
another probe containing the additional upstream sequences of the 
BCV N gene (Verbeek et al., 1990). This approach was chosen because 
32P-labelled BCV-specific recombinant plasmids were capable of 
detecting many TCV isolates and TCV present in clinical samples 
(unpublished results). Clones that hybridized with both probes were 
selected for further characterization. Two PstI-generated insert 
fragments of recombinant plasmid pM78 were subcloned into 
replicative form DNA of bacteriophage M13mp19, while phage clones 
with opposite insert orientations, determined according to Poncz et al. 
(1982), were subjected to exonuclease III/nuclease S1 degradation 
(Henikoff, 1984) to create clones with a nested set of deletions. The 
TCV M gene was obtained by cloning a fragment amplified by PCR 
using BCV-specific primers [PXBAV (5' GAA CAT TTC TAG ATT 
GGT CGG ACT G 3') reverse complementary to the sequence located 
1527 to 1551 nucleotides from the 3' end and PC (5' ATG AGT AGT 
GTA ACT ACA CCA GCA 3') hybridizing to nucleotides 2314 to 2337 
from the 3' end] and TCV-Minnesota genomic RNA. The insert from 
recombinant plasmid pME1 was subcloned in M13mp19 and 
sequenced in both directions as described above for pM78. Sequencing 
was according to the method of Sanger et al. (1977). Sequences were 
analysed and compared with the IBI Pustell sequence programs. 
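
The orientation of a primer described as "reverse complementary" can be checked with a few lines of code. The Python sketch below computes the reverse complement of PXBAV (sequence as given above) and looks for it in a short stretch of the sense-strand sequence. The names `reverse_complement` and `FIG2_EXCERPT` are illustrative only, and the excerpt is a hand transcription of part of the sequence shown in Fig. 2, so it should be treated as an assumption rather than as reference data.

```python
# Watson-Crick complement table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Primer PXBAV as printed in the Methods (5' to 3').
PXBAV = "GAA CAT TTC TAG ATT GGT CGG ACT G"

# Assumption: sense-strand excerpt transcribed by hand from Fig. 2.
FIG2_EXCERPT = "GGCATCCTTAAGTGGGCCGATCAGTCCGACCAATCTAGAAATGTTCAAACC"

target = reverse_complement(PXBAV)
print(target)
print("primer site present in excerpt:", target in FIG2_EXCERPT)
```

The primer is 25 nucleotides long, which agrees with the stated span of 1527 to 1551 nucleotides from the 3' end.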

Amplification by PCR using RNA isolated from TCV-positive clinical 
specimens. The supernatant (100 µl) of clarified intestinal contents was 
supplemented with 1 µg of tRNA (Sigma) before RNA extraction 
(Chomczynski & Sacchi, 1987). RNA was reverse-transcribed as 
described earlier (Verbeek & Tijssen, 1990) using a primer 
complementary to BCV RNA. PCR was performed on cDNA 
templates with BCV sequence-specific primer combinations: (i) 
PIORF1 (5' GG GGGATCC TTA CAC CAG AGG TAG GGG TTC 
3', reverse complementary to the sequence located 951 to 971 
nucleotides from the 3' end) and PIORF2 (5' GG AAGCTT ATG GCA 
TCC TTA AGT GGG CCG 3', complementary to the sequence 1554 to 
1574 nucleotides from the 3' end) to amplify the N-internal open 
reading frame (ORF) of 624 bp (including the translation stop codon; 
detecting a fragment of 640 bp containing the primer sequences), (ii) 
PE1E (5' GG AAGCTT ATG AGT AGT GTA ACT ACA CCA 3', 
complementary to the sequence 2317 to 2337 nucleotides from the 3' 
end) and PE1F (5' GG GGATCC TTA GAT ATT ATT TCT CAA 
CAA T 3', reverse complementary to the sequence located 1645 to 1666 
nucleotides from the 3' end) to amplify the 693 bp (including the 
translation stop codon) TCV M gene (709 bp, including the primer 
sequences) and (iii) PE1E and PE1G (5' GG GAGCTC TAA GAT 
GAT AGT AAG GGG CCA 3', reverse complementary to the 
sequence located 2131 to 2151 nucleotides from the 3' end) to amplify 
207 bp fragments (223 bp, including the primer sequences) encoding 
the N terminus of the TCV M protein. Underlined primer sequences 
represent non-viral sequences containing restriction endonuclease 
sites. PCR was for 30 cycles under conditions described earlier 
(Verbeek & Tijssen, 1990) and in the presence of 0.5 µl of [α-32P]dCTP 
(ICN; 3000 Ci/mmol, 3.3 µM) as a tracer for the amplified fragments. 


Fig. 1. Strategy used to sequence the N and M genes of TCV- 
Minnesota and the M gene of a TCV Quebec isolate. pM78 and pME1 
represent plasmids containing cDNA inserts of 1.7 and 0.81 kbp, 
corresponding to the N and M genes, respectively. pQE7 contains a 
PCR-amplified fragment, corresponding to the M gene of TCV Quebec 
isolate number 6. All inserts were also subcloned in M13mp19, 
analysed for their orientation and subjected to unidirectional deletion. 
The arrows represent sequences obtained from the deleted-insert 
clones. 
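
The fragment sizes quoted for each primer combination follow directly from the primer coordinates, all of which are counted from the 3' end of the genome. The Python sketch below reproduces that arithmetic. The class name `Amplicon` and its fields are illustrative, and the figure of 16 non-viral nucleotides per product (8 per primer) is an assumption inferred from the stated sizes (for example 640 bp versus 624 bp), not a value given separately in the text. The PXBAV/PC pair carries no added tail in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """A PCR target described by coordinates counted from the 3' end."""
    name: str
    near: int     # coordinate closest to the 3' end (inclusive)
    far: int      # coordinate farthest from the 3' end (inclusive)
    tail_nt: int  # assumed total of non-viral nucleotides from both primers

    @property
    def viral_bp(self) -> int:
        # Inclusive span of viral sequence between the outer primer ends.
        return self.far - self.near + 1

    @property
    def product_bp(self) -> int:
        # Viral span plus the non-viral primer tails.
        return self.viral_bp + self.tail_nt


AMPLICONS = [
    Amplicon("PXBAV/PC (M gene clone)", 1527, 2337, 0),
    Amplicon("PIORF1/PIORF2 (internal ORF)", 951, 1574, 16),
    Amplicon("PE1E/PE1F (M gene)", 1645, 2337, 16),
    Amplicon("PE1E/PE1G (M N terminus)", 2131, 2337, 16),
]

for amp in AMPLICONS:
    print(f"{amp.name}: {amp.viral_bp} bp viral, {amp.product_bp} bp product")
```

Under these assumptions the computed sizes agree with the values in the text: 811 bp for the PXBAV/PC fragment, and 624/640, 693/709 and 207/223 bp for the three clinical-specimen primer pairs.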


Results 

cDNA cloning and clone selection 

Clone pM78, selected by colony hybridization, contained 
an insert of about 1.65 kbp, corresponding to the 3' end of 
the TCV genome. The insert was subcloned in M13mp19 
and both strands were sequenced (Fig. 1). Sequences 
corresponding to the ORF of the M protein were 
obtained by cloning a fragment amplified by PCR using 
BCV-specific primers (PXBAV and PC) and RNA 
isolated from purified TCV-Minnesota. Clone pME1 
was found to contain the expected insert of 811 bp and 
was subcloned for sequencing (Fig. 1). 


PE1E 
ATG AGT AGT GTA ACT ACA CCA GCA CCA GTT TAC ACC TGG ACT GCT GAT GAA GCT ATT AAA TTC CTA AAG GAA TGG AAC TTT TCT TTG GGT ATT ATA CTA CTT TTT ATT ACA ATC ATA TTG  120
M: MSSVTTPAPVYTWTADEAIKFLKEWNFSLGIILLFITIIL

PE1G 
CAA TTT GGA TAT ACA AGT CGC AGT ATG TCT GTT TAT GTT ATT AAG ATG ATC ATT TTG TGG CTT ATG TGG CCC CTT ACT ATC ATC TTA ACT ATT TTC AAT TGC GTG TAT GCG TTG AAT AAT  240
M: QFGYTSRSMSVYVIKMIILWLMWPLTIILTIFNCVYALNN

GTG TAT CTT GGC TTT TCT ATA GTT TTC ACT ATA GTG GCC ATT ATC ATG TGG ATT GTG TAT TTT GTG AAT AGT ATC AGG TTG TTT ATT AGA ACT GGA AGT TGG TGG AGT TTC AAC CCA GAA  360
M: VYLGFSIVFTIVAIIMWIVYFVNSIRLFIRTGSWWSFNPE

ACA AAC AAC TTG ATG TGT ATA GAT ATG AAG GGA AGG ATG TAT GTT AGG CCG ATA ATT GAG GAC TAC CAT ACC CTT ACG GTC ACA ATA ATA CGT GGT CAT CTT TAC ATG CAA GGT ATA AAA  480
M: TNNLMCIDMKGRMYVRPIIEDYHTLTVTIIRGHLYMQGIK

CTA GGT ACT GGC TAT TCT TTG TCA GAT TTG CCA GCT TAT GTG ACT GTT GCT AAG GTC TCA CAC CTG CTC ACG TAT AAG CGT GGT TTT CTT GAC AAG ATA GGC GAT ACT AGT GGT TTT GCT  600
M: LGTGYSLSDLPAYVTVAKVSHLLTYKRGFLDKIGDTSGFA

PE1F 
GTT TAT GTT AAG TCC AAA GTC GGT AAT TAC CGA CTG CCA TCA ACC CAA AAG GGT TCT GGC ATG GAC ACC GCA TTG TTG AGA AAT AAT ATC TAA ACT TTA AGG ATG TCT TTT ACT CCT GGT  720
M: VYVKSKVGNYRLPSTQKGSGMDTALLRNNI
N: MSFTPG

PIORF2, PXBAV 
AAG CAA TCC AGT AGT AGA GCG TCC TCT GGA AAT CGT TCT GGT AAT GGC ATC CTT AAG TGG GCC GAT CAG TCC GAC CAA TCT AGA AAT GTT CAA ACC AGG GGT AGA AGA GCT CAA CCC AAG  840
N: KQSSSRASSGNRSGNGILKWADQSDQSRNVQTRGRRAQPK
IORF: MASLSGPISPTNLEMFKPGVEELNPS

CAA ACT GCT ACT TCT CAG CAA CCA TCA GGA GGG AAT GTT GTA CCC TAC TAT TCT TGG TTC TCT GGA ATT ACT CAG TTT CAA AAA GGA AAG GAG TTT GAA TTT GCA GAG GGA CAA GGT GTG  960
N: QTATSQQPSGGNVVPYYSWFSGITQFQKGKEFEFAEGQGV
IORF: KLLLLSNHQEGMLYPTILGSLELLSFKKERSLNLQRDKVC

CCT ATT GCA CCA GGA GTC CCA GCT ACT GAA GCT AAG GGG TAC TGG TAC AGA CAC AAC AGA CGT TCT TTT AAA ACA GCC GAT GGC AAC CAG CGT CAA CTG CTG CCA CGA TGG TAT TTT TAC  1080
N: PIAPGVPATEAKGYWYRHNRRSFKTADGNQRQLLPRWYFY
IORF: LLHQESQLLKLRGTGTDTTDVLLKQPMATSVNCCHDGIFT

TAT CTT GGA ACA GGA CCG CAT GCC AAA GAC CAG TAT GGC ACC GAT ATT GAC GGA GTC TTC TGG GTC GCT AGT AAC CAG GCT GAT GTC AAT ACC CCG GCT GAC ATT CTC GAT CGG GAC CCA  1200
N: YLGTGPHAKDQYGTDIDGVFWVASNQADVNTPADILDRDP
IORF: ILEQDRMPKTSMAPILTESSGSLVTRLMSIPRLTFSIGTQ

AGT AGC GAT GAG GCT ATT CCG ACT AGG TTT CCG CCT GGC ACG GTA CTC CCT CAG GGT TAC TAT ATT GAA GGC TCA GGA AGG TCT GCT CCT AAT TCC AGA TCT ACT TCA CGC GCA TCC AGT  1320
N: SSDEAIPTRFPPGTVLPQGYYIEGSGRSAPNSRSTSRASS
IORF: VAMRLFRLGFRLARYSLRVTILKAQEGLLLIPDLLHAHPV

AGA GCC TCT AGT GCA GGA TCG CGT AGT AGA GCC AAT TCT GGC AAC AGA ACC CCT ACC TCT GGT GTA ACA CCT GAT ATG GCT GAT CAA ATT GCT AGT CTT GTT CTG GCA AAA CTT GGC AAG  1440
N: RASSAGSRSRANSGNRTPTSGVTPDMADQIASLVLAKLGK
IORF: EPLVQDRVVEPILATEPLPLV

GAT GCC ACT AAG CCA CAG CAA GTA ACT AAG CAG ACT GCC AAA GAA ATC AGA CAG AAA ATT TTG AAT AAG CCC CGC CAG AAG AGG AGC CCC AAT AAA CAA TGC ACT GTT CAG CAG TGT TTT  1560
N: DATKPQQVTKQTAKEIRQKILNKPRQKRSPNKQCTVQQCF

GGG AAG AGA GGC CCC AAT CAG AAT TTT GGT GGT GGA GAA ATG TTA AAA CTT GGA ACT AGT GAC CCA CAG TTC CCC ATT CTT GCA GAA CTC GCA CCC ACA GCT GGT GCC TTT TTC TTT GGA  1680
N: GKRGPNQNFGGGEMLKLGTSDPQFPILAELAPTAGAFFFG

TCA AGA TTA GAG TTG GCC AAA GTG CAG AAT TTG TCT GGG AAT CTT GAC GAG CCC CAG AAG GAT GTT TAT GAA TTG CGC TAT AAT GGT GCA ATT AGA TTT GAC AGT ACA CTT TCA GGT TTT  1800
N: SRLELAKVQNLSGNLDEPQKDVYELRYNGAIRFDSTLSGF

GAG ACC ATA ATG AAG GTG TTG AAT GAG AAT TTG AAT GCA TAT CAA CAA CAA GAT GGT ATG ATG AAT ATG AGT CCA AAA CCA CAG CGT CAG CGT GGT CAG AAG AAT GGA CAA GGA GAA AAT  1920
N: ETIMKVLNENLNAYQQQDGMMNMSPKPQRQRGQKNGQGEN

GAT AAT ATA AGT GTT GCA GCG CCT AAA AGC CGT GTG CAG CAA AAT AAG AGT AGA GAG TTG ACT GCA GAG GAC ATC AGC CTT CTT AAG AAG ATG GAT GAG CCC TAT ACT GAA GAC ACC TCA  2040
N: DNISVAAPKSRVQQNKSRELTAEDISLLKKMDEPYTEDTS

GAA ATA TAA GAG AAT GAA CCT TAT GTC GGC ACC TGG TGG TAA GCC CTC GCA GGA AAG TCG GGA TAA GGC ACT CTC TAT CAG AAT GGA TGT CTT GCT GCT ATA ATA GAT AGA GAA GGT TAT  2160
N: EI

AGC AGA CTA TAG ATT AAT TAG TTG AAA GTT TTG TGT GGT AAT GTA TAG TGT TGG AGA AAG TGA AAG ACT TGC GGA AGT AAT TGC CGA CAA GTG CCC AAG GGA AGA GCC AGC ATG TTA AGT  2280

TAC CAC CCA GTA ATT AGT AAA TGA ATG AAG TTA ATT ATG GCC AAT TGG AAG AAT CAC  2337


Fig. 2. cDNA sequence of the first 2337 nucleotides of the 3' end of the TCV genome. Predicted amino acid sequences are shown for 
three ORFs, corresponding to the M and N genes and a reading frame inside the N gene. Nucleotide differences between the BCV and 
TCV sequences are indicated in circles above the sequence of TCV. The intergenic consensus region between the N and M genes and 
the 3' conserved 10 base sequence are underlined. Arrows correspond to the locations of primers used in PCR amplification. 


Sequence analysis of TCV-Minnesota cDNA clones 

The nucleotide sequence of the 3' end of the TCV 
genome, i.e. the N and M genes, and their predicted 
amino acid sequences are shown in Fig. 2. A non-coding 
region of 291 bases excluding the poly(A) tail was found 
at the 3' end of the genome and contains a 10 base 
consensus region, GGGAAGAGCC, at 70 to 79 bases 
from the 3' end. The location and sequence of this region 
are similar to the consensus regions found in murine 
hepatitis virus (MHV) and IBV (Boursnell et al., 1985) as 
well as the consensus published for porcine transmissible 
gastroenteritis coronavirus (TGEV) (Kapke & Brian, 
1986) and two different strains of BCV (Lapps et al., 
1987; Cruciere & Laporte, 1988). 

Fig. 3. Schematic design of the location of ORFs obtained when 
translating three frames of the 2337 nucleotides located at the 3' end of 
TCV genomic RNA. Vertical lines in the translated frames represent 
termination codons, while lines in the ‘MET’ rectangles represent 
methionine codons that could serve as translation initiation sites. 


The largest translational reading frame, of 1344 
nucleotides (292 to 1635 nucleotides from the 3' end; Fig. 
2 and 3), predicted a 448 amino acid protein with an Mr of 
49K. This frame is likely to encode the N protein because of 
its location (Spaan et al., 1988) and because its predicted Mr 
approaches that found for the TCV N protein 
(Dea & Tijssen, 1988). The consensus region, 
AUAUCUAAACUUUAAGG, intergenic to N and M, was the 
same as that for BCV and closely resembled those 
observed for MHV strains A59 (Armstrong et al., 1983) 
and JHM (Skinner & Siddell, 1983), and HCV-OC43 
(Kamahora et al., 1989). 

The second largest translational reading frame (bases 
1648 to 2337 from the 3' end; Fig. 2 and 3) was predicted 
to encode a protein of 230 amino acids with an Mr of 
about 26K, which is likely to be the M protein. The 
predicted protein has 113 hydrophobic residues 
(approximately 49% hydrophobicity) with a distribution similar 
to the BCV and MHV hydrophobic amino acids. The 
first 28 N-terminal amino acid residues contain six 
potential sites for O- and one site for N-glycosylation. 
Most basic amino acid residues (17/23) were found in the 
C-terminal half of the protein. 
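
Composition figures of this kind (113 of 230 residues hydrophobic, 17 of 23 basic residues in the C-terminal half) are simple residue counts. The Python sketch below shows one way such fractions could be computed. The residue sets `HYDROPHOBIC` and `BASIC` are assumptions chosen for illustration; the text does not state which residues were counted in each class, so the sketch is not expected to reproduce the published numbers from the sequence.

```python
# Assumed residue classes; the paper does not define them explicitly.
HYDROPHOBIC = frozenset("AVLIMFWC")
BASIC = frozenset("KR")


def residue_fraction(protein: str, residues: frozenset) -> float:
    """Fraction of positions in `protein` occupied by any of `residues`."""
    protein = protein.upper()
    if not protein:
        raise ValueError("protein sequence is empty")
    return sum(aa in residues for aa in protein) / len(protein)


# Toy peptide: two hydrophobic and two basic residues out of four.
print(residue_fraction("AVKR", HYDROPHOBIC))
print(residue_fraction("AVKR", BASIC))

# The percentage quoted in the text follows from the stated counts.
print(round(100 * 113 / 230))
```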

An overlapping ORF (bases 951 to 1574 from the 3' 
end), predicting a protein of 207 amino acids with an Mr 
of 23K, was found inside the coding sequence of the N 
protein (Fig. 2 and 3). 
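
The three-frame analysis summarised in Fig. 3 can be illustrated with a small scanner that records every stretch running from an ATG to the next in-frame stop codon. The Python sketch below is a minimal illustration of that idea, not the program used in the study (the Pustell programs). The function names and the toy sequence are hypothetical. The helper `codons_in_span` simply checks that the coordinate ranges quoted above are whole numbers of codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Scan the three forward frames for ATG...stop ORFs.

    Returns (frame, start, end, n_codons) tuples, longest first.
    start is 0-based, end is exclusive and includes the stop codon,
    n_codons counts coding codons only (stop excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3, n_codons))
                start = None
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)


def codons_in_span(first: int, last: int) -> int:
    """Number of codons in an inclusive coordinate range."""
    length = last - first + 1
    if length % 3:
        raise ValueError("span is not a whole number of codons")
    return length // 3


# Hypothetical toy sequence with a short ORF nested in a second frame.
print(find_orfs("ATGGCATGCCCTAAATGA"))

# Coordinate ranges quoted in the text (counted from the 3' end).
print(codons_in_span(292, 1635))   # N frame
print(codons_in_span(1648, 2337))  # M frame
print(codons_in_span(951, 1574))   # internal ORF, stop codon included
```

The last call gives one codon more than the 207 residues of the predicted protein because the quoted range for the internal ORF includes its stop codon, as noted in Methods.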



Fig. 4. Electrophoretic profiles (a and c) of PCR-amplified products and further identification of the fragments by autoradiography of 
the gels (b and d). (a) Lanes 1 to 8 refer to eight clinical samples and lanes 9 and 10 to third passage culture fluids of two other TCV 
isolates used to extract nucleic acid for cDNA synthesis and amplification by PCR with primers PE1E and PE1G, respectively. PCR, 
using the same combination of primers, was also applied to nucleic acid isolated from mock-infected HRT-18 cells (lane 11). The 223 bp 
amplified fragments represent gene fragments encoding the N terminus of the M protein. Samples of one-tenth of the reaction mixtures 
were analysed on the gel. Autoradiography of the dried gel (b) was for 2 h at −70 °C. Amplification by PCR was done on RNA isolated 
from clinical specimens 5 and 6, using a combination of primers PE1E and PE1F to amplify the translational reading frame of the M 
gene (709 bp) (c, sample 5 in lane 1; sample 6 in lane 4). Further amplification was assayed using primers PIORF1 and 2 to amplify a 640 
bp fragment containing sequences of the translational reading frame inside the N gene. The IORF-amplified products from samples 5 
and 6 are shown in (c) and (d), lanes 3 and 2, respectively. Lanes 5 and 6 in (c) and (d) refer to the same 223 bp amplified fragments as in 
(a). Lane 0 contains DNA markers (bp). Autoradiography of the dried gel (d) was for 5 h at −70 °C. 
Fig. 5. Schematic representation and comparison of the sequences from 
cloned PCR-amplified TCV-specific fragments (see Fig. 4) with the 
corresponding sequence of the TCV-Minnesota and BCV-Mebus 
strains, respectively. Lines represent identity between the sequences of 
both viruses, whereas an asterisk refers to single nucleotide differences 
compared to the BCV sequence. CS (clinical sample) and TC (tissue 
culture) indicate the origin of the samples used for RNA extraction. 


Amplification by PCR 

The complete M gene, or gene fragments corresponding 
to the N terminus of the M protein, were amplified, 
cloned and sequenced using eight TCV-positive clinical 
specimens as starting material. Similarly, TCV-containing 
culture fluid supernatants, obtained after three 
passages of virus from two different clinical samples, 
were also subjected to PCR. Agarose gel electrophoresis 
of 10% of each PCR reaction mixture showed that 
amplification occurred in two of eight clinical specimens 
(Fig. 4a, b; lanes 5 and 6) and in both cultured isolates 
(Fig. 4a, b; lanes 9 and 10). Autoradiography revealed 
significant background amplification in some of the 
samples (Fig. 4b; lanes 1, 2 and 7), which is not observed 
in samples where actual amplification has occurred. 
Amplified products were absent in samples after PCR 
using nucleic acid isolated from mock-infected HRT-18 
cells (Fig. 4a, lane 11). RNA from clinical samples 5 and 
6 was also used for amplification with a combination of 
primers that would amplify the 624 bp internal ORF 
(IORF) (640 bp, including the primers) located inside the 
N gene, and the 693 bp ORF of the M protein (709 bp 
including the primers). Agarose gel electrophoresis (Fig. 
4c) of 10% of the reaction products revealed that 
amplification could only be detected after 
autoradiography of the gel in two out of four reactions (Fig. 4d). 
The 709 bp amplified fragment of sample 6 (Fig. 4c, d; 
lane 4) and the 223 bp fragments of clinical samples 5, 6, 
9 and 10 (Fig. 4a, b) were re-amplified and cloned in 
pUC-9 after poly(C) tailing. 

Sequence analysis of cloned PCR-amplified fragments 

Comparison of the sequences of cloned PCR-amplified 
products with those of TCV-Minnesota and the BCV 
Mebus strain is presented schematically in Fig. 5. The 
single nucleotide difference at position 149, in the M 
genes of TCV and BCV (Fig. 2), was also found in the 
sequence of the 223 bp fragment obtained from clinical 
specimen 5. The sequences of the other 223 bp fragments 
(clinical sample 6 and those obtained from the cultured 
TCV isolates), as well as the complete M gene (709 bp 
fragment; plasmid pQE7) of TCV from clinical sample 
6, were identical to the sequence published for BCV (Fig. 
1, 2 and 5). 


IBV       . HP NETMC TLDFEQSVQL FKEYNLFITA 
            * * * 
HCV-229E  .. .-msndhc tgdi---vth lknwnfgwnv 

Fig. 6. Amino acid sequence comparison of the TCV 60 residue N 
terminus of the M protein with corresponding regions of other 
coronavirus strains by maximum alignment of the amino acid sequence 
of the complete M proteins. Potential N-glycosylation sites are 
underlined; potential sites for O-glycosylation are identified by an 
asterisk. Numbering corresponds to that of TGEV. 

Discussion 

The sequence of the first 2337 nucleotides from the 3' end 
of the TCV RNA genome revealed a 291 base 
non-coding region and two ORFs with positions 
corresponding to those for the N and M proteins of coronaviruses 
(Spaan et al., 1988). The 3' non-coding 291 base region 
has a 10 nucleotide sequence (GGGAAGAGCC) which 
is relatively conserved throughout the Coronaviridae 
family and may be involved in attachment of the 
polymerase to initiate negative-strand RNA synthesis 
(Spaan et al., 1988). 

The largest translational reading frame of 1344 
nucleotides was predicted to encode a 448 amino acid, 
49K protein which is likely to be the N protein. The 
predicted protein is basic and serine-rich (43/448 amino 
acids) and its serine residues tend to be clustered in two 
regions. One of these clusters is located at the N terminus 
of the protein, and the other cluster is situated between 
amino acids 190 and 239 from the N terminus. Such 
clusters are also found with other mammalian and avian 
coronaviruses (Boursnell et al., 1985; Kapke & Brian, 
1986; Lapps et al., 1987; Kamahora et al., 1989) and 
possibly represent phosphorylation ‘hot spots’. 

The TCV N protein amino acid sequence shares 
extensive identity with the analogous sequences of BCV 
(99%; Lapps et al., 1987) and HCV-OC43 (98%; 
Kamahora et al., 1989), although TCV is classified in a 
separate antigenic group. On the other hand, little 
similarity was found with the corresponding sequences 
of IBV (approx. 30%; Boursnell et al., 1985) and TGEV 
(approx. 30%; Kapke & Brian, 1986). Furthermore, 
comparison of the N protein amino acid sequences of 
TCV with MHV, TGEV and HCV-OC43 reveals regions 
of up to 69 amino acids with significant sequence 
identity (> 90%) which may represent functional 
domains having survived evolutionary pressures. We 
showed previously that BCV probes specific to different 
regions throughout the genome were individually 
capable of detecting TCV isolates or TCV in clinical 
specimens (unpublished results). Since homology 
between BCV and HCV-OC43 has been confirmed by 
serological studies (Hogue et al., 1984), and RNA 
fingerprinting data suggest a close resemblance between 
the remaining as yet unsequenced portions of the 
genomes (Lapps & Brian, 1985), we also expect an 
overall genomic relationship between TCV and 
HCV-OC43, although this has to be further investigated. 
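
Identity percentages such as those quoted above are obtained by counting matching positions in aligned sequences. The Python sketch below shows one common convention, in which alignment columns containing a gap are skipped. The function name and the short example peptides are illustrative; the text does not say how gaps were treated in the original comparisons, so that choice is an assumption.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are ignored (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Illustrative 10-residue peptides differing at one position.
print(percent_identity("MSFTPGKQSS", "MSFTPGKQSF"))

# Two differing residues in a 448-residue protein.
print(round(100 * 446 / 448, 1))
```

Two amino acid differences in a protein of 448 residues correspond to about 99.6% identity, consistent with the value of more than 99% reported for the TCV and BCV N proteins.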

Only two nucleotide differences were found between 
the N gene sequences of TCV and BCV. The first is 
located towards the N terminus of the protein and results 
in an amino acid change from Ser in TCV to Phe in BCV 
at amino acid position 15 of the protein, when compared 
to the BCV sequence published by Lapps et al. (1987). 
However, Cruciere & Laporte (1988) reported a serine residue 
in the same position in another BCV strain, and this is 
also the case in HCV-OC43 (Kamahora et al., 1989). The 
second nucleotide difference, at amino acid position 53 of 
the protein, gives a Gln in TCV and a Leu in BCV; this 
position is again different in HCV-OC43 and MHV strains JHM 
and A59. 

An IORF of 624 nucleotides within the N gene is 
analogous to one in BCV; the corresponding region of 
HCV-OC43 contains two IORFs. The presence of 
IORFs, either inside the N gene or partially in the 3' 
non-coding region, which are often preceded by an 
AUG codon in a favourable context for translation 
initiation (Kozak, 1983), is frequently observed with 
other coronaviruses [i.e. BCV (Lapps et al., 1987), TGEV 
(Kapke & Brian, 1986), feline coronavirus (de Groot et 
al., 1988), IBV (Boursnell et al., 1985), HCV-OC43 
(Kamahora et al., 1989) and HCV strain 229E (Schreiber 
et al., 1989)]. It is not yet known whether these IORFs 
are functional, but it is of interest to determine their 
possible translation products in virus-infected cells and 
to verify their translation by means of expression vectors. 

Close similarity was also observed between the TCV 
M protein amino acid sequence and the corresponding 
sequences of BCV (up to 100%, with a single amino acid 
difference at position 50) and MHV (86%). The expected 
membrane topology of the M protein would therefore be 
likely to resemble the model proposed by Rottier and 
collaborators (Armstrong et al., 1984; Rottier et al., 
1986). Most of the basic amino acids are situated in the 
C-terminal half of the protein and might, therefore, 
interact with the negatively charged RNA and the acidic 
residues of the N protein as suggested by Sturman et al. 
(1980). 

The N and M protein and intergenic sequences of 
TCV-Minnesota were up to 100% identical to the 
corresponding sequences of BCV. Therefore, we envisaged that (i) we 
might have worked with a laboratory-created 
recombinant virus, (ii) the HRT-18 cells, which are of human 
origin and used for the production of all our BCV and 
TCV isolates, might conceal a latent infection with a 
closely related human coronavirus that could have been 
activated upon infection with another coronavirus and 
(iii) the inoculum may have been contaminated with 
BCV. The second possibility was not likely, as 
hybridization assays with BCV-specific probes on supernatants or 
nucleic acid from mock-infected HRT-18 cells (Verbeek 
& Tijssen, 1988) did not reveal any indication of this and 
neither did amplification by PCR using nucleic acid 
from mock-infected cells (Fig. 4). In order to rule out all 
three possibilities, RNA from different TCV-positive 
clinical specimens was isolated for cDNA synthesis and 
amplification by PCR of fragments corresponding to the 
N terminus or the complete translational reading frame 
of the M protein (Fig. 4). The addition of tRNA to the 
samples, as a co-precipitator and a competitor for 
RNases, was essential for the isolation of RNA templates 
suitable for PCR. 

Sequence analysis of the cloned fragments from 
clinical samples showed that the single amino acid 
difference between the TCV and BCV M sequences at 
position 50 of the protein was found again in the N- 
terminal translated sequence of the amplified 223 bp 
fragment from clinical isolate number 5 (Fig. 5). This 
difference was not observed in the corresponding 
fragments of any other isolate (Fig. 5). Complete identity 
was also found between the TCV M protein sequence, 
which was amplified from clinical specimen 6 (Fig. 4), 
and the sequence of BCV (Fig. 5). The data obtained 
from these experiments are fully consistent with the M 
nucleotide sequence obtained for the TCV strain 
Minnesota. 

The only remaining ambiguity is the type of 
glycosylation of the M protein, which is N-linked in TCV 
(Dea et al., 1990) and O-linked in BCV (Lapps et al., 1987), 
although the nucleotide sequence of this 
region is the same in both viruses. In addition to the 
possible O-glycosylation sites there is one potential site 
for N-glycosylation (Asn-Phe-Ser) within the first 28 
N-terminal residues of the TCV M protein. By 
maximum alignment of the amino acid sequences of 
HCV-OC43, TGEV, MHV, BCV and IBV, this site is found at the 
same position in MHV, BCV and TGEV, but not in IBV 
(Fig. 6). The latter two viruses, in which the M proteins 
have N-linked oligosaccharide side-chains, possess an 
additional one (TGEV) or two (IBV) potential 
N-glycosylation sites further upstream, while HCV-229E 
has one potential N-terminal glycosylation site at 
another position (Fig. 6). Whether the single 
N-glycosylation site is glycosylated in TCV remains to be 
seen. 
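
Potential N-glycosylation sites are conventionally recognised by the sequon Asn-X-Ser/Thr, where X is any residue except proline. The Python sketch below scans a peptide for that pattern. The function name is illustrative, and the 28-residue peptide is a hand transcription of the N terminus of the predicted TCV M protein from Fig. 2, so it should be read as an assumption rather than as verified sequence data.

```python
import re

# Asn-X-Ser/Thr with X != Pro; the lookahead allows overlapping sites.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def n_glycosylation_sites(protein: str):
    """Return (1-based position of Asn, tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(protein.upper())]


# Assumption: first 28 residues of the TCV M protein, transcribed from Fig. 2.
TCV_M_NTERM = "MSSVTTPAPVYTWTADEAIKFLKEWNFS"

print(n_glycosylation_sites(TCV_M_NTERM))
```

For the transcribed peptide the scan reports a single site, Asn-Phe-Ser with the asparagine at position 26, in agreement with the one potential N-glycosylation site described in the text. A sequon only marks a candidate; whether the site is actually used has to be shown experimentally.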

The data presented here show that TCV is extremely 
closely related to BCV to the point that they cannot be 
distinguished on the basis of the nucleotide sequences so 
far known. There is expected to be an overall genomic 
homology as BCV probes to different genomic locations 
are efficient in detecting TCV isolates (unpublished 
results). We have also succeeded in amplifying the 
complete TCV S gene (data not shown), using primer 
combinations selected from a recently published BCV S 
gene sequence (Boireau et al., 1990). It is expected that 
these genes may contain more differences because 
monoclonal antibodies against the S protein enabled 
differentiation of the two viruses (Dea et al., 1990). 

Interestingly, although TCV, BCV and HCV-OC43 
must have only recently diverged from each other, they 
possess different target cell specificities in vitro and infect 
different animal species. HCV-OC43 causes mainly 
respiratory diseases, whereas BCV and TCV affect the 
gastrointestinal system. However, the pathogenicity of 
TCV and BCV isolates for turkey poults (unpublished 
results), as well as their c.p.e. in vitro are different (Dea et 
al., 1990). Sequence analysis of the S protein responsible 
for cell attachment should reveal differences concerning 
regions important for host cell specificity. The predicted 
amino acid sequence homology between TCV and BCV, 
thus far analysed, supports our proposal for the 
reclassification of TCV, which was previously based mainly on 
their antigenic relatedness (Dea et al., 1990). 